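# Import Data Below
library(readr)       # read_csv()
library(dplyr)       # %>%, select(), group_by(), summarise()
library(kableExtra)  # kbl(), kable_styling(), kable_classic()
medical_student <- read_csv(file = "DataMedTeach.csv", show_col_types = FALSE)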
medical_student
## # A tibble: 886 × 20
## id age year sex glang part job stud_h health psyt jspe qcae_cog
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2 18 1 1 120 1 0 56 3 0 88 62
## 2 4 26 4 1 1 1 0 20 4 0 109 55
## 3 9 21 3 2 1 0 0 36 3 0 106 64
## 4 10 21 2 2 1 0 1 51 5 0 101 52
## 5 13 21 3 1 1 1 0 22 4 0 102 58
## 6 14 26 5 2 1 1 1 10 2 0 102 48
## 7 17 23 5 2 1 1 0 15 3 0 117 58
## 8 21 23 4 1 1 1 1 8 4 0 118 65
## 9 23 23 4 2 1 1 1 20 2 0 118 69
## 10 24 22 2 2 1 1 0 20 5 0 108 56
## # … with 876 more rows, and 8 more variables: qcae_aff <dbl>, amsp <dbl>,
## # erec_mean <dbl>, cesd <dbl>, stai_t <dbl>, mbi_ex <dbl>, mbi_cy <dbl>,
## # mbi_ea <dbl>
library(ggplot2)
ggplot(medical_student, aes(x = factor(year), y = mbi_ex))+
geom_boxplot() +
labs(x = "Year of Study", y = "Burnout Level")
burnout_by_year <- aggregate(mbi_ex ~ year, data = medical_student, FUN = mean)
burnout_table <- kbl(burnout_by_year, align = "c") %>%
kable_classic(full_width = FALSE) %>%
kable_styling("striped", font_size = 14)
burnout_table
| year | mbi_ex |
|---|---|
| 1 | 17.67755 |
| 2 | 18.46667 |
| 3 | 17.88811 |
| 4 | 16.59350 |
| 5 | 15.32283 |
| 6 | 14.02655 |
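The table above reports only the mean per year. As a rough sketch of how the spread and the group size could be shown alongside it, the block below uses a small simulated data frame (not the survey data) with the same two column names. The `toy` and `by_year` objects are illustrative assumptions.

```r
set.seed(5)
# Simulated stand-in for two columns of the survey; values are invented
toy <- data.frame(
  year   = sample(1:6, 120, replace = TRUE),
  mbi_ex = round(rnorm(120, mean = 17, sd = 5))
)

# Mean, standard deviation, and group size per year in one table
by_year <- aggregate(
  mbi_ex ~ year, data = toy,
  FUN = function(v) c(mean = mean(v), sd = sd(v), n = length(v))
)

# aggregate() returns a matrix column here; flatten it into plain columns
by_year <- do.call(data.frame, by_year)
by_year
```

Reporting `n` next to each mean makes it easier to judge whether a difference between years rests on enough respondents.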
# Scatterplot of study hours against MBI exhaustion, with a fitted regression line
plot(medical_student$stud_h, medical_student$mbi_ex, pch = 19, col = "lightblue",
xlab = "Study hours", ylab = "MBI exhaustion")
abline(lm(mbi_ex ~ stud_h, data = medical_student), col = "red", lwd = 3)
cor_val <- round(cor(medical_student$mbi_ex, medical_student$stud_h), 2)
# Print the correlation below the x-axis (a label at y = 95 would fall outside the plotted range)
mtext(paste("Correlation:", cor_val), side = 1, line = 2)
I start by creating a density plot to get a first look at the data and judge whether this question is worth investigating. JSPE scores, which this report uses as job satisfaction scores, come from the JSPE (Jefferson Scale of Physician Empathy), a 20-item scale with each item answered on a 7-point Likert scale (1 = Strongly Disagree, 7 = Strongly Agree).
# Create a density plot to compare the job satisfaction score between genders
medical_student_i_1 <- medical_student
ggplot(medical_student_i_1, aes(x = jspe, fill = factor(sex))) +
geom_density(alpha = 0.5) +
scale_fill_manual(values = c("#00BFC4", "#F8766D", "#FFFFB0"), name = "Gender",
labels = c("Male", "Female", "Non-Binary")) +
theme_minimal() +
labs(x = "JSPE Score", y = "Density",
title = "Distribution of JSPE Scores by Gender") +
theme(plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
axis.text = element_text(size = 12),
axis.title = element_text(size = 12, face = "bold"),
legend.position = "top",
legend.text = element_text(size = 12),
legend.title = element_text(size = 12, face = "bold"),
panel.border = element_rect(color="gray30", fill=NA, linewidth=1))
sum(medical_student$sex == 3)
## [1] 5
A t-test compares only two groups. My original question compared job satisfaction scores between males and females, and the 5 responses from non-binary students are too few to support any statistical statement, so the subsequent code in this section excludes them. With sufficient data, a comparison that includes non-binary students could also yield interesting results.
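The group-size check behind this decision can be sketched as follows. The data frame is a small made-up stand-in (not the survey data), and the cutoff `min_n` is an illustrative assumption rather than a rule used in this report.

```r
# Toy stand-in for the survey data; values are invented for illustration
toy <- data.frame(
  sex  = c(rep(1, 12), rep(2, 15), rep(3, 2)),
  jspe = c(seq(95, 117, by = 2), seq(98, 126, by = 2), 104, 110)
)

# Count respondents per group before choosing a test
group_sizes <- table(toy$sex)
print(group_sizes)

# Keep only groups with at least `min_n` respondents (hypothetical cutoff)
min_n <- 10
keep <- names(group_sizes)[group_sizes >= min_n]
toy_two_groups <- subset(toy, sex %in% as.numeric(keep))

# Exactly two groups remain, so a two-sample t-test is applicable
length(unique(toy_two_groups$sex))
```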
# Create a density plot to compare the job satisfaction score between males & females.
medical_student_i_2 <- subset(medical_student, sex != 3)
ggplot(medical_student_i_2, aes(x = jspe, fill = factor(sex))) +
geom_density(alpha = 0.5) +
scale_fill_manual(values = c("#00BFC4", "#F8766D"), name = "Gender",
labels = c("Male", "Female")) +
theme_minimal() +
labs(x = "JSPE Score", y = "Density",
title = "Distribution of JSPE Scores by Gender") +
theme(plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
axis.text = element_text(size = 12),
axis.title = element_text(size = 12, face = "bold"),
legend.position = "top",
legend.text = element_text(size = 12),
legend.title = element_text(size = 12, face = "bold"),
panel.border = element_rect(color="gray30", fill=NA, linewidth=1))
# Conduct two-sample t-test
t.test(jspe ~ sex, data = medical_student_i_2)
##
## Welch Two Sample t-test
##
## data: jspe by sex
## t = -3.3288, df = 490.27, p-value = 0.0009377
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
## -3.4648501 -0.8927977
## sample estimates:
## mean in group 1 mean in group 2
## 104.8327 107.0116
In summary, the test results indicate a statistically significant difference (p-value ≈ 0.001) in mean job satisfaction score between females and males, with female participants reporting higher scores on average (107.0 vs. 104.8).
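Statistical significance says little about how large the difference is. With the means reported above, the raw gap is about 2.2 points. The sketch below shows how the pieces of a `t.test()` result can be pulled out and turned into a rough effect size (Cohen's d with a pooled standard deviation). It runs on simulated scores, not the survey data, and the object names are illustrative.

```r
set.seed(1)
# Simulated scores for two groups; NOT the survey data
toy <- data.frame(
  sex  = rep(c(1, 2), times = c(40, 60)),
  jspe = c(rnorm(40, mean = 105, sd = 9), rnorm(60, mean = 107, sd = 9))
)

res <- t.test(jspe ~ sex, data = toy)  # Welch test by default

# Pieces of the htest object that the printed output is built from
res$p.value    # two-sided p-value
res$conf.int   # 95% CI for mean(group 1) - mean(group 2)
diff_means <- unname(res$estimate[1] - res$estimate[2])

# Cohen's d with a pooled standard deviation, as a rough effect size
g1 <- toy$jspe[toy$sex == 1]
g2 <- toy$jspe[toy$sex == 2]
pooled_sd <- sqrt(((length(g1) - 1) * var(g1) + (length(g2) - 1) * var(g2)) /
                    (length(g1) + length(g2) - 2))
cohens_d <- diff_means / pooled_sd
cohens_d
```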
language_codes <- c("1"="French", "15"="German", "20"="English", "37"="Arab", "51"="Basque", "52"="Bulgarian", "53"="Catalan", "54"="Chinese", "59"="Korean", "60"="Croatian", "62"="Danish", "63"="Spanish", "82"="Estonian", "83"="Finnish", "84"="Galician", "85"="Greek", "86"="Hebrew", "87"="Hindi", "88"="Hungarian", "89"="Indonesian", "90"="Italian", "92"="Japanese", "93"="Kazakh", "94"="Latvian", "95"="Lithuanian", "96"="Malay", "98"="Dutch", "100"="Norwegian", "101"="Polish", "102"="Portuguese", "104"="Romanian", "106"="Russian", "108"="Serbian", "112"="Slovak", "113"="Slovenian", "114"="Swedish", "116"="Czech", "117"="Thai", "118"="Turkish", "119"="Ukrainian", "120"="Vietnamese", "121"="Other")
medical_6 <- medical_student %>%
select(glang, psyt, stai_t, mbi_ex, mbi_cy, mbi_ea) %>%
group_by(glang) %>%
summarise(
'Psychological_Distress_Score' = mean( psyt ),
'Anxiety_Inventory'= mean(stai_t),
'Exhaustion_Burnout'= mean(mbi_ex),
'Cynicism_Burnout' = mean(mbi_cy),
'Efficacy_Burnout' = mean(mbi_ea)
) %>%
mutate('Language_Spoken' = language_codes[as.character(glang)]) %>%
select('Language_Spoken', everything(), -glang)
#table
medical_6 %>%
kbl() %>%
kable_styling()
| Language_Spoken | Psychological_Distress_Score | Anxiety_Inventory | Exhaustion_Burnout | Cynicism_Burnout | Efficacy_Burnout |
|---|---|---|---|---|---|
| French | 0.2329149 | 42.48675 | 16.82008 | 10.019526 | 24.16457 |
| German | 0.1935484 | 42.87097 | 16.38710 | 9.741936 | 24.77419 |
| English | 0.2272727 | 40.54545 | 16.45455 | 10.636364 | 25.09091 |
| Arab | 0.3333333 | 55.66667 | 22.33333 | 9.666667 | 22.66667 |
| Chinese | 0.0000000 | 49.00000 | 26.00000 | 14.000000 | 21.00000 |
| Croatian | 0.0000000 | 54.66667 | 16.00000 | 7.666667 | 25.66667 |
| Spanish | 0.4000000 | 42.40000 | 16.20000 | 9.400000 | 24.40000 |
| Italian | 0.1777778 | 43.24444 | 16.66667 | 10.111111 | 24.22222 |
| Japanese | 0.0000000 | 45.00000 | 13.00000 | 16.000000 | 21.00000 |
| Lithuanian | 0.0000000 | 41.00000 | 15.00000 | 15.000000 | 29.00000 |
| Dutch | 0.0000000 | 49.00000 | 18.00000 | 8.000000 | 21.00000 |
| Portuguese | 0.1481481 | 48.85185 | 18.29630 | 10.925926 | 24.44444 |
| Romanian | 0.2500000 | 38.75000 | 15.50000 | 8.000000 | 26.25000 |
| Russian | 0.1666667 | 45.50000 | 17.16667 | 10.500000 | 25.83333 |
| Serbian | 0.0000000 | 32.00000 | 8.00000 | 5.000000 | 27.00000 |
| Swedish | 0.0000000 | 49.00000 | 24.00000 | 15.000000 | 20.00000 |
| Turkish | 0.5000000 | 47.00000 | 22.00000 | 11.500000 | 25.00000 |
| Vietnamese | 0.0000000 | 53.50000 | 17.00000 | 12.500000 | 21.00000 |
| Other | 0.2307692 | 47.84615 | 18.23077 | 11.153846 | 22.61538 |
# Psychological Distress Scores
ggplot(medical_6[, c(1, 2)], aes(x = Language_Spoken, y = Psychological_Distress_Score, fill = Language_Spoken)) +
geom_bar(stat = "identity") +
labs(x = "Language Spoken", y = "Psychological Distress Score", title = "Psychological Distress Scores by Language Spoken") +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "none") # rotate x-axis labels for better readability
# Anxiety Inventory
ggplot(medical_6[, c(1, 3)], aes(x = Language_Spoken, y = Anxiety_Inventory, fill = Language_Spoken)) +
geom_bar(stat = "identity") +
labs(x = "Language Spoken", y = "Anxiety Inventory", title = "Anxiety Inventory by Language Spoken") +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "none") # rotate x-axis labels for better readability
# Exhaustion Burnout
ggplot(medical_6[, c(1, 4)], aes(x = Language_Spoken, y = Exhaustion_Burnout, fill = Language_Spoken)) +
geom_bar(stat = "identity") +
labs(x = "Language Spoken", y = "Exhaustion Burnout", title = "Exhaustion Burnout Scores by Language Spoken") +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "none") # rotate x-axis labels for better readability
# Cynicism Burnout
ggplot(medical_6[, c(1, 5)], aes(x = Language_Spoken, y = Cynicism_Burnout, fill = Language_Spoken)) +
geom_bar(stat = "identity") +
labs(x = "Language Spoken", y = "Cynicism Burnout", title = "Cynicism Burnout Scores by Language Spoken") +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "none") # rotate x-axis labels for better readability
# Efficacy Burnout
ggplot(medical_6[, c(1, 6)], aes(x = Language_Spoken, y = Efficacy_Burnout, fill = Language_Spoken)) +
geom_bar(stat = "identity") +
labs(x = "Language Spoken", y = "Efficacy Burnout", title = "Efficacy Burnout Scores by Language Spoken") +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "none") # rotate x-axis labels for better readability
correlations <- medical_student %>%
select(glang, psyt, stai_t, mbi_ex, mbi_cy, mbi_ea)%>%
cor()
# Print correlation matrix
correlations%>%
kbl() %>%
kable_styling()
|  | glang | psyt | stai_t | mbi_ex | mbi_cy | mbi_ea |
|---|---|---|---|---|---|---|
| glang | 1.0000000 | -0.0428184 | 0.0918513 | 0.0380150 | 0.0369215 | -0.0016969 |
| psyt | -0.0428184 | 1.0000000 | 0.2932823 | 0.1772418 | 0.1457021 | -0.1625439 |
| stai_t | 0.0918513 | 0.2932823 | 1.0000000 | 0.5304859 | 0.3318845 | -0.4625348 |
| mbi_ex | 0.0380150 | 0.1772418 | 0.5304859 | 1.0000000 | 0.5051998 | -0.4808207 |
| mbi_cy | 0.0369215 | 0.1457021 | 0.3318845 | 0.5051998 | 1.0000000 | -0.5659386 |
| mbi_ea | -0.0016969 | -0.1625439 | -0.4625348 | -0.4808207 | -0.5659386 | 1.0000000 |
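One caveat on the matrix above: `glang` is a nominal language code, so its Pearson correlations with the scores have no natural interpretation. A minimal sketch of restricting the matrix to the score columns, and of getting a confidence interval for one pair with `cor.test()`, is given below. It uses simulated values under the same column names, not the survey data.

```r
set.seed(2)
# Simulated stand-in with the same column names; values are invented
n <- 200
toy <- data.frame(
  glang  = sample(c(1, 15, 20, 90), n, replace = TRUE),  # nominal language code
  stai_t = rnorm(n, 43, 10)
)
toy$mbi_ex <- 5 + 0.3 * toy$stai_t + rnorm(n, 0, 4)

# Drop the nominal code before computing Pearson correlations
score_cols <- setdiff(names(toy), "glang")
round(cor(toy[, score_cols]), 2)

# cor.test() adds a confidence interval and p-value for a single pair
ct <- cor.test(toy$stai_t, toy$mbi_ex)
ct$estimate
ct$conf.int
```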
medical_7 <- medical_student %>%
select(health, mbi_ea)
#table
medical_7[1:10,] %>%
arrange(desc(health))%>%
kbl() %>%
kable_styling()
| health | mbi_ea |
|---|---|
| 5 | 21 |
| 5 | 23 |
| 4 | 26 |
| 4 | 23 |
| 4 | 27 |
| 3 | 20 |
| 3 | 23 |
| 3 | 16 |
| 2 | 18 |
| 2 | 22 |
# Calculate mean academic efficacy for each health level
health_efficacy <- medical_7 %>%
group_by(health) %>%
summarize(mean_academic_efficacy = mean(mbi_ea))
#table
health_efficacy %>%
kbl() %>%
kable_styling()
| health | mean_academic_efficacy |
|---|---|
| 1 | 24.62162 |
| 2 | 21.60920 |
| 3 | 22.83824 |
| 4 | 24.24129 |
| 5 | 25.91964 |
# Plot relationship between health and academic efficacy
ggplot(health_efficacy, aes(x = health, y = mean_academic_efficacy)) +
geom_bar(stat = "identity") +
labs(title = "Relationship between Academic Efficacy and Health", x = "Health", y = "Mean Academic Efficacy Score") +
theme_bw()
# Calculate correlation between health level and mean academic efficacy (computed on the 5 group means, not on individual responses)
cor(health_efficacy$health, health_efficacy$mean_academic_efficacy)%>%
kbl() %>%
kable_styling()
| x |
|---|
| 0.4967542 |
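The value above is a correlation across the five group means, which is not the same as a correlation across respondents. The sketch below contrasts the two on simulated data (invented values, same column names) and adds a rank-based alternative, since `health` is an ordinal rating. Averaging removes within-group spread, so the group-mean figure is typically larger in magnitude than the respondent-level one.

```r
set.seed(3)
# Simulated stand-in: health on a 1-5 scale, efficacy loosely related to it
n <- 300
health <- sample(1:5, n, replace = TRUE)
mbi_ea <- 22 + 0.6 * health + rnorm(n, 0, 5)
toy <- data.frame(health, mbi_ea)

# Correlation computed on the five group means (the approach used above)
means <- aggregate(mbi_ea ~ health, data = toy, FUN = mean)
r_means <- cor(means$health, means$mbi_ea)

# Correlation computed on every respondent
r_raw <- cor(toy$health, toy$mbi_ea)

# Rank-based alternative, since health is an ordinal rating
r_spearman <- cor(toy$health, toy$mbi_ea, method = "spearman")

c(group_means = r_means, raw = r_raw, spearman = r_spearman)
```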
ggplot(data = medical_student, aes(x = jspe, y = health, color = "Health")) +
geom_point(size = 3) +
labs(x = "JSPE Scores", y = "Health Scores",
title = "Relationship between JSPE Scores and Health") +
theme(plot.title = element_text(size = 14, face = "bold")) +
scale_color_manual(values = c("#0072B2"))
ggplot(data = medical_student, aes(x = jspe, y = stai_t, color = "State Anxiety")) +
geom_point(size = 3) +
labs(x = "JSPE Scores", y = "State Anxiety Scores",
title = "Relationship between JSPE Scores and State Anxiety") +
theme(plot.title = element_text(size = 14, face = "bold")) +
scale_color_manual(values = c("#E69F00"))
ggplot(data = medical_student, aes(x = jspe, y = amsp, color = "AMSP")) +
geom_point(size = 3) +
labs(x = "JSPE Scores", y = "AMSP Scores",
title = "Relationship between JSPE Scores and AMSP") +
theme(plot.title = element_text(size = 14, face = "bold")) +
scale_color_manual(values = c("#009E73"))
ggplot(data = medical_student, aes(x = jspe, y = stud_h, color = "Study Hours")) +
geom_point(size = 3) +
labs(x = "JSPE Scores", y = "Study Hours",
title = "Relationship between JSPE Scores and Study Hours") +
theme(plot.title = element_text(size = 14, face = "bold")) +
scale_color_manual(values = c("#CC79A7"))
ggplot(data = medical_student, aes(x = factor(amsp), y = stai_t)) +
geom_boxplot( color = "#0072B2", alpha = 0.8, size = 0.8) +
labs(x = "AMSP Scores", y = "State Anxiety Scores",
title = "Relationship between AMSP Scores and State Anxiety")
The graph shows a slight negative trend between anxiety level and academic motivation: students with higher academic motivation tend to have somewhat lower anxiety levels.
partnership_status=
medical_student %>%
select(c(part))
partnership_status
## # A tibble: 886 × 1
## part
## <dbl>
## 1 1
## 2 1
## 3 0
## 4 0
## 5 1
## 6 1
## 7 1
## 8 1
## 9 1
## 10 1
## # … with 876 more rows
# Histogram of partnership status (part takes the values 0 and 1)
ps <- ggplot(partnership_status, aes(x = part)) +
geom_histogram(fill = "blue", color = "black", bins = 2)
ps + theme_minimal()
# Same plot with axis labels, a title, and reference lines
# (geom_histogram() plots counts, so the y-axis is labelled "Count")
part_plot <- ps + xlab("Partnership status (part)") + ylab("Count") +
ggtitle("Distribution of Med Students in Partnership") +
geom_vline(xintercept = .50) +
geom_hline(yintercept = 1.46) +
theme_minimal()
part_plot
health_sat =
medical_student %>%
select(c(health))
hs<-ggplot(health_sat, aes(x=health)) + geom_histogram(fill='blue',color='black', bins=5)
hs + theme_minimal()
plot1=
ggplot(medical_student, aes(x=health, fill =as.factor(part)))+
geom_histogram(bins=5)+
scale_fill_manual(values=c('red','green'))+
facet_wrap(~part)+
labs(fill = "Partnership Status")
plot1 + theme_minimal()
ex_cynic_figure=
ggplot(medical_student,aes(x=mbi_ex,y=mbi_cy))+
geom_point()
ex_cynic_figure
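# Dashed slope-1 reference line through the point of means
# (intercept = mean(y) - mean(x), with y = mbi_cy and x = mbi_ex)
cynic_figure_line <- ex_cynic_figure + geom_abline(aes(intercept = mean(mbi_cy) - mean(mbi_ex),
slope = 1),
linetype = 2, color = "red")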
cynic_figure_line
another_one <- cynic_figure_line+ geom_smooth(method = "lm")
another_one
## `geom_smooth()` using formula = 'y ~ x'
compare_to_academics<-
another_one+
facet_wrap(~mbi_ea)
compare_to_academics
## `geom_smooth()` using formula = 'y ~ x'
## Warning in qt((1 - level)/2, df): NaNs produced
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
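The warnings above most likely arise because faceting on every distinct `mbi_ea` value leaves some panels with too few points to fit a line with a confidence band. One possible workaround, sketched here with invented scores and arbitrary cut points, is to group the score into a few bands first.

```r
# Example efficacy scores (invented for illustration)
mbi_ea <- c(10, 14, 16, 18, 20, 21, 23, 24, 26, 27, 29, 31, 33, 36)

# Group scores into three ordered bands; the cut points are arbitrary
efficacy_band <- cut(
  mbi_ea,
  breaks = c(-Inf, 20, 27, Inf),
  labels = c("low", "medium", "high")
)

# Each band now holds several observations, enough for a fitted line
table(efficacy_band)
```

A column built this way could then be passed to `facet_wrap()` in place of the raw score.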
Our group decided to investigate Q2 and Q4 in further detail.
From the correlation plot that we made for one of the initial
questions, we had noticed that efficacy was weakly positively correlated
with stud_h (study hours), health
(self-reported health score), jspe (job satisfaction
score), qcae_cog (cognitive empathy measured on the QCAE
scale), and amsp (academic motivation score).
We also saw a weak negative correlation with psyt (had
psychotherapy in the last year) and qcae_aff
(affective empathy score measured on the QCAE scale). However, the most
noteworthy observations were the strong negative correlations that
efficacy had with anxiety, depression, exhaustion, and cynicism (using
the same terminology as the Interpreter’s section Q2).
library(ggplot2)
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
data <- medical_student_i_3
linear_regression <- function(data, x, y) {
OUTPUT <- ggplot(data, aes(x, y)) +
geom_point() +
geom_smooth(method = "lm", formula = 'y ~ x', se = FALSE) +
theme_minimal()
return (OUTPUT)
}
depression <- linear_regression(data, data$depression, data$efficacy) +
ggtitle("Relation b/w Depression & Efficacy") +
labs(x = "Depression", y = "Efficacy")
anxiety <- linear_regression(data, data$anxiety, data$efficacy) +
ggtitle("Relation b/w Anxiety & Efficacy") +
labs(x = "Anxiety", y = "Efficacy")
exhaustion <- linear_regression(data, data$exhaustion, data$efficacy) +
ggtitle("Relation b/w Exhaustion & Efficacy") +
labs(x = "Exhaustion", y = "Efficacy")
cynicism <- linear_regression(data, data$cynicism, data$efficacy) +
ggtitle("Relation b/w Cynicism & Efficacy") +
labs(x = "Cynicism", y = "Efficacy")
grid.arrange(depression, anxiety, exhaustion, cynicism, ncol=2)
studyhours <- linear_regression(data, data$stud_h, data$efficacy) +
ggtitle("Relation b/w Study Hours & Efficacy") +
theme(plot.title = element_text(hjust = 0.5, size = 8, face = "bold")) +
labs(x = "Study Hours", y = "Efficacy")
jobsatisfaction <- linear_regression(data, data$jspe, data$efficacy) +
ggtitle("Relation b/w Job Satisfaction Score & Efficacy") +
theme(plot.title = element_text(hjust = 0.5, size = 8, face = "bold")) +
labs(x = "Job Satisfaction Score", y = "Efficacy")
health<-ggplot(data, aes(x = as.factor(health), y = efficacy)) +
geom_boxplot() +
ggtitle("Relation b/w Health & Efficacy") +
# apply the base theme first so that it does not reset the title styling below
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, size = 8, face = "bold"))+
labs(x = "Health", y = "Efficacy")
cogemp <- linear_regression(data, data$qcae_cog, data$efficacy) +
ggtitle("Relation b/w Cognitive Empathy & Efficacy") +
theme(plot.title = element_text(hjust = 0.5, size = 8, face = "bold")) +
labs(x = "Cognitive Empathy Score", y = "Efficacy")
academicmotiv <- linear_regression(data, data$amsp, data$efficacy) +
ggtitle("Relation b/w Academic Motivation & Efficacy") +
theme(plot.title = element_text(hjust = 0.5, size = 8, face = "bold")) +
labs(x = "Academic Motivation", y = "Efficacy")
grid.arrange(studyhours, jobsatisfaction, health, cogemp, academicmotiv, ncol = 3)
#turn sex into a factor to do the plot color division
x<-as.factor(data$sex)
dep.g<-ggplot(data, aes(x = age, y = depression, color= x)) +
geom_point() +
geom_smooth(method = lm, se = FALSE, aes(color = x)) +
scale_color_manual(values = c("1" = "#F8766D", "2" = "#00BFC4"),
breaks = c(1, 2),
labels = c("Male", "Female")) +
labs(color = "Gender", fill = "Gender") +
theme_minimal()
cyn.g <- ggplot(data, aes(x = age, y = cynicism, color = x)) +
geom_point() +
geom_smooth(method = lm, se = FALSE, aes(color = x)) +
scale_color_manual(values = c("1" = "#F8766D", "2" = "#00BFC4"),
breaks = c(1, 2),
labels = c("Male", "Female")) +
labs(color = "Gender", fill = "Gender") +
theme_minimal()
exh.g<-ggplot(data, aes(x = age, y = exhaustion, color= x))+
geom_point() +
geom_smooth(method = lm, se = FALSE, aes(color = x)) +
scale_color_manual(values = c("1" = "#F8766D", "2" = "#00BFC4"),
breaks = c(1, 2),
labels = c("Male", "Female")) +
labs(color = "Gender", fill = "Gender") +
theme_minimal()
anx.g<-ggplot(data, aes(x = age, y = anxiety, color= x))+
geom_point() +
geom_smooth(method = lm, se = FALSE, aes(color = x)) +
scale_color_manual(values = c("1" = "#F8766D", "2" = "#00BFC4"),
breaks = c(1, 2),
labels = c("Male", "Female")) +
labs(color = "Gender", fill = "Gender") +
theme_minimal()
grid.arrange(dep.g,anx.g,cyn.g,exh.g, ncol=2)
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
For the following t-tests, males are group 1 and females are group 2. There were only 5 non-binary respondents, which is too few to run a test and draw statistically meaningful conclusions.
medical.student.followup.1 <- medical_student_i_2
# Conduct two-sample t-test for depression in males vs females.
t.test(cesd ~ sex, data = medical.student.followup.1)
##
## Welch Two Sample t-test
##
## data: cesd by sex
## t = -7.785, df = 624.6, p-value = 2.906e-14
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
## -7.397363 -4.417146
## sample estimates:
## mean in group 1 mean in group 2
## 14.00364 19.91089
# Conduct two-sample t-test for anxiety in males vs females.
t.test(stai_t ~ sex, data = medical.student.followup.1)
##
## Welch Two Sample t-test
##
## data: stai_t by sex
## t = -8.1083, df = 542.81, p-value = 3.432e-15
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
## -8.387794 -5.116257
## sample estimates:
## mean in group 1 mean in group 2
## 38.27273 45.02475
# Conduct two-sample t-test for Exhaustion in Males vs Females
t.test(mbi_ex ~ sex, data = medical.student.followup.1)
##
## Welch Two Sample t-test
##
## data: mbi_ex by sex
## t = -4.8453, df = 519.11, p-value = 1.671e-06
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
## -2.584502 -1.093325
## sample estimates:
## mean in group 1 mean in group 2
## 15.61818 17.45710
# Conduct two-sample t-test for cynicism in males vs females.
t.test(mbi_cy ~ sex, data = medical.student.followup.1)
##
## Welch Two Sample t-test
##
## data: mbi_cy by sex
## t = -0.33477, df = 528.33, p-value = 0.7379
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
## -0.7662659 0.5431276
## sample estimates:
## mean in group 1 mean in group 2
## 9.989091 10.100660
# Conduct two-sample t-test for academic efficacy in males vs females.
t.test(mbi_ea ~ sex, data = medical.student.followup.1)
##
## Welch Two Sample t-test
##
## data: mbi_ea by sex
## t = 0.90525, df = 503.26, p-value = 0.3658
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
## -0.3627671 0.9827131
## sample estimates:
## mean in group 1 mean in group 2
## 24.44364 24.13366
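Because five outcomes are tested on the same respondents, it can be worth checking whether the conclusions survive a multiple-comparison adjustment. The sketch below applies a Holm correction to the five p-values printed above. Choosing Holm is an assumption made here for illustration, not something the original analysis did.

```r
# p-values copied from the five Welch t-tests above
p_values <- c(
  cesd   = 2.906e-14,
  stai_t = 3.432e-15,
  mbi_ex = 1.671e-06,
  mbi_cy = 0.7379,
  mbi_ea = 0.3658
)

# Holm adjustment controls the family-wise error rate across the five tests
p_holm <- p.adjust(p_values, method = "holm")
p_holm

# Outcomes that remain significant at the 0.05 level after adjustment
names(p_holm)[p_holm < 0.05]
```

Depression, anxiety, and exhaustion stay significant after the adjustment, while cynicism and efficacy do not.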
The initial investigation of the questions revealed interesting
results and insights. The dataset was primarily composed of
self-reported survey responses, but it also included scores generated
from established inventory scales such as MBI and
STAI. Most of the columns in the dataset were numerical
but represented categorical groups. For instance,
sex was represented by 1 for males, 2 for females, and 3
for non-binary. Similarly, other variables like glang,
part, job, health, and
psyt were also represented by numerical values.
Surprisingly, mean exhaustion (burnout) scores were lower for
upperclassmen, falling from about 18.5 in year 2 to 14.0 in year 6,
which was unexpected as we had anticipated more
burnout with longer time spent in medical school. Furthermore,
women reported higher job satisfaction scores than men,
which was a positive surprise considering the historical gender
disparities in pay. Interestingly, partnership status did not
appear to affect students’ satisfaction with their health: more
students were in partnerships than not, but the distribution of
health satisfaction was similar in both groups. Job satisfaction showed
only a weak relationship with other variables such as health, anxiety, academic
motivation, and study hours. The trend in the plots suggested that
higher job satisfaction was associated with higher values
of the remaining variables. This led to a question about how these
metrics would vary across geographic regions. The investigation into the
effects of other variables on academic efficacy also showed similar
trends. The decrease in burnout levels as students progressed in medical
school led us to question why this was the case. Finally, the
question about gender disparities in job satisfaction led to further
investigation into gender differences across other metrics and whether
any of these differences were statistically significant.
After analyzing the data, we found the second and fourth follow-up questions to be the most promising. As students, we understand that academic efficacy is crucial for future academic or career pursuits, so we decided to investigate the variables associated with it. Additionally, we recognized that the gender wage gap is not the only difference between males and females in medical school, so we explored which metrics differed significantly between genders and how they varied with age. For the second follow-up, we referred to the correlation plot to study the negative correlations between efficacy and depression, anxiety, exhaustion, and cynicism, creating scatterplots with a line of best fit to examine them. We also investigated the weak positive correlations between efficacy and study hours, job satisfaction, health, cognitive empathy, and academic motivation. To explore the fourth follow-up, we created scatterplots of depression, anxiety, cynicism, and exhaustion against age, colored by gender, so that we could compare the genders while accounting for age. The t-tests showed a statistically significant difference between males and females in mean depression, anxiety, and exhaustion (all p < 0.001), but not in cynicism (p = 0.74) or efficacy (p = 0.37). Finally, older students reported lower depression and anxiety, possibly due to a more stable mental state. However, cynicism in females appeared to increase with age, and we plan to investigate this further.
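One way to follow up on the observation about cynicism and age would be a linear model with an age-by-gender interaction, which estimates whether the age slope differs between groups. The block below is only a sketch on simulated data. The slopes, the `toy` data frame, and the "male"/"female" labels are all invented for illustration and are not estimates from the survey.

```r
set.seed(4)
# Simulated stand-in; the slopes below are invented, not estimated from the survey
n <- 200
age <- sample(18:30, n, replace = TRUE)
sex <- factor(sample(c("male", "female"), n, replace = TRUE))
cynicism <- 8 + 0.1 * age + ifelse(sex == "female", 0.15 * (age - 18), 0) +
  rnorm(n, 0, 3)
toy <- data.frame(age, sex, cynicism)

# age * sex expands to age + sex + age:sex
fit <- lm(cynicism ~ age * sex, data = toy)

# The age:sex row estimates how much the age slope differs between groups
coef(summary(fit))
```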